Prediction of postoperative infectious complications in elderly patients with colorectal cancer: a study based on improved machine learning

Background: Infectious complications after colorectal cancer (CRC) surgery increase perioperative mortality and are significantly associated with poor prognosis. We aimed to develop a model for predicting infectious complications after CRC surgery in elderly patients, based on improved machine learning (ML) and using inflammatory and nutritional indicators.
Methods: The data of 512 elderly patients with colorectal cancer treated at the Third Affiliated Hospital of Anhui Medical University from March 2018 to April 2022 were retrospectively collected and randomly divided into a training set and a validation set. The optimal cutoff values of NLR (3.80), PLR (238.50), PNI (48.48), LCR (0.52), and LMR (2.46) were determined from the receiver operating characteristic (ROC) curve. Six conventional machine learning models were constructed using patient data in the training set: Linear Regression, Random Forest, Support Vector Machine (SVM), BP Neural Network (BP), Light Gradient Boosting Machine (LGBM), and Extreme Gradient Boosting (XGBoost). An improved moderately greedy XGBoost (MGA-XGBoost) model was also constructed. The performance of the seven models was evaluated on the validation set by the area under the ROC curve (AUC), accuracy (ACC), precision, recall, and F1-score.
Results: A total of 512 cases were included in this study; 125 cases (24%) had postoperative infectious complications. Postoperative infectious complications were notably associated with 10 features: American Society of Anesthesiologists (ASA) score, operation time, diabetes, presence of a stoma, tumor location, NLR, PLR, PNI, LCR, and LMR. MGA-XGBoost reached the highest AUC (0.862) on the validation set and was the best model for predicting postoperative infectious complications in elderly patients with colorectal cancer. Among the model's internal features, LCR had the highest importance.
Conclusions: This study demonstrates for the first time that the MGA-XGBoost model with 10 risk factors might predict postoperative infectious complications in elderly CRC patients.


Introduction
Because of population aging and the susceptibility of intestinal cells in the elderly [1], as many as 70% of colorectal cancer patients are aged 65 or over [2]. At present, surgery is the cornerstone of colorectal cancer treatment. Relevant data show that the age of patients undergoing intestinal surgery is gradually increasing [3]. However, with aging, organ function and immune function decline in people over 65 years old, and underlying diseases become more common. Moreover, elderly patients often have poor nutritional absorption after surgery, poor recovery after invasive treatment, and weak resistance to pathogens, so they are prone to postoperative infection. Therefore, this study focuses on the elderly population in order to improve prediction accuracy for this group.
Postoperative infectious complications increase patient costs and length of hospital stay, and delay the start of postoperative adjuvant therapy [4]. More importantly, considerable evidence shows that postoperative infectious complications are significantly associated with poor prognosis of CRC [5,6]. If postoperative infectious complications in elderly patients can be predicted early, quality of life and prognosis can be improved by the timely use of prophylactic antibiotics and early goal-directed therapy. At present, most studies focus only on the effect of individual markers on the prediction of postoperative infection. In this paper, we comprehensively consider the influence of several predictors of infectious complications on postoperative infection: the platelet-to-lymphocyte ratio (PLR) [7], the lymphocyte-to-monocyte ratio (LMR) [8], the neutrophil-to-lymphocyte ratio (NLR) [9], the lymphocyte-to-C-reactive protein ratio (LCR) [10], and the prognostic nutritional index (PNI) [11], all measured in peripheral blood. These factors have been reported to predict the incidence of infectious complications in different types of cancer.
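As a minimal sketch of how these five indicators can be derived from routine blood values, the snippet below computes the ratios and dichotomizes them at the cutoffs reported in this study. Several points are assumptions rather than details given in the text: the function and variable names are hypothetical, the units are assumed (cell counts in 10^9/L, albumin in g/L), PNI is computed with the commonly used Onodera formula, and the strict inequality at each cutoff is an arbitrary choice.

```python
# Cutoffs reported in this study; the comparison direction at the boundary is an assumption.
CUTOFFS = {"NLR": 3.80, "PLR": 238.50, "PNI": 48.48, "LCR": 0.52, "LMR": 2.46}
# Indicators for which a HIGH value marks risk; for the others a LOW value does.
HIGH_IS_RISK = {"NLR", "PLR"}


def compute_indices(neutrophils, lymphocytes, monocytes, platelets, crp, albumin):
    """Return the five indices from preoperative blood values.

    Assumed units: cell counts in 10^9/L, albumin in g/L, CRP in the unit
    used when the LCR cutoff was derived (not stated in the text).
    """
    return {
        "NLR": neutrophils / lymphocytes,
        "PLR": platelets / lymphocytes,
        "LMR": lymphocytes / monocytes,
        "LCR": lymphocytes / crp,
        # Onodera's PNI: albumin (g/L) + 5 x lymphocyte count (10^9/L); assumed here.
        "PNI": albumin + 5 * lymphocytes,
    }


def risk_flags(indices):
    """Dichotomize each index at its cutoff (1 = on the high-risk side)."""
    flags = {}
    for name, value in indices.items():
        if name in HIGH_IS_RISK:
            flags[name] = int(value > CUTOFFS[name])
        else:
            flags[name] = int(value < CUTOFFS[name])
    return flags


# Hypothetical patient, for illustration only.
example = compute_indices(neutrophils=4.0, lymphocytes=1.6, monocytes=0.5,
                          platelets=240.0, crp=4.0, albumin=40.0)
example_flags = risk_flags(example)
```

Encoding each indicator as a binary flag matches the way categorical variables are described later in the paper, but the models could equally be fed the continuous ratios.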
Many researchers have attempted to predict infectious complications following colorectal surgery with prediction models that include various clinicopathological factors. These models rely on traditional statistical analysis, such as logistic regression, Cox regression, and nomograms. Compared with traditional statistical analysis, machine learning can capture complex nonlinear relationships in complex medical data sets and can continuously adapt to the data to improve the accuracy, sensitivity, and specificity of the prediction model [12,13]. However, some medical personnel may not realize that traditional ML models are prone to overfitting. Therefore, this study improves the XGBoost algorithm with a moderate greedy algorithm (MGA), yielding MGA-XGBoost, to improve the accuracy of the prediction model.

Data sources
The medical record system of the Third Affiliated Hospital of Anhui Medical University was searched using 'colon cancer' and 'rectal cancer' as keywords. The clinical data of patients with colorectal cancer confirmed by postoperative pathology after radical operations in gastrointestinal surgery from March 2018 to April 2022 were retrospectively collected.
Inclusion criteria: 1) age ≥ 65 years; 2) diagnosis of colorectal cancer and radical resection of colorectal cancer; 3) no history of radiotherapy or chemotherapy before the operation, no distant organ metastasis, and postoperative pathological stage 0, I, II, or III; 4) no other malignant tumors; 5) complete clinicopathological data.
Exclusion criteria: 1) age < 65 years; 2) incomplete clinical data; 3) acute or chronic infectious diseases, or long-term use of immunosuppressive agents, before the operation; 4) preoperative radiotherapy or chemotherapy, or distant metastasis; 5) new postoperative diseases unrelated to surgery; 6) emergency surgery for colorectal cancer with intestinal obstruction; 7) discharge against medical advice, so that postoperative complications could not be accurately assessed.
Preoperative and intraoperative variables were collected for screening of risk factors. Information on the following 25 variables was obtained: age, sex, body mass index (BMI), ASA score, smoking status, previous comorbidities (chronic lung disease, diabetes), surgical method, intraoperative blood transfusion, and presence of a stoma; laboratory examination data within 7 days before surgery (lymphocytes, C-reactive protein, soterocyte, albumin, monocytes, white blood cells, hematocrit, international normalized ratio, fibrinogen, total bilirubin, direct bilirubin, aspartate aminotransferase (AST), blood urea nitrogen (BUN), creatinine, uric acid, Na+, Ca2+); tumor information (pathological T-stage, pathological N-stage, pathological stage, tumor location, tumor size); and operation information (intraoperative bleeding, operation time).

Postoperative infection
Common postoperative infectious complications were recorded, including respiratory and pulmonary infection, incision infection, anastomotic leakage, abdominal abscess, and urinary tract infection, among others. The diagnostic criteria for infection followed the corresponding guidelines and standard references [5,14,15]. Briefly: 1) Incision infection: infection of the skin and subcutaneous tissue within 30 days after surgery, with wound redness, swelling, heat, and pain, and pus draining from the incision. 2) Anastomotic leakage: clinical signs of peritonitis such as tenderness, rebound pain, and muscle tension; color Doppler ultrasound showing gas and fluid around the anastomosis, or CT showing anastomotic disruption. 3) Abdominal abscess: infection of the abdominal space within 30 days after the operation, manifested as abdominal pain, persistent fever, and other symptoms, confirmed by puncture or B-mode ultrasound, and improved after surgical drainage or anti-infective treatment. 4) Urinary tract infection: cystitis or urethritis within 30 days after the operation, with bladder irritation symptoms such as urinary frequency, urgency, and dysuria; routine urine examination showing pyuria and hematuria; and pathogenic bacteria cultured from urine. 5) Pulmonary infection: body temperature > 38.0 °C, elevated white blood cell count, cough, expectoration, and other clinical symptoms; dry and wet rales heard in the lungs; and a chest X-ray showing new infiltrative lesions.

Conventional statistical analysis
SPSS 24.0 software was used to process and analyze the data. The optimal cutoff values of NLR, PLR, PNI, LCR, and LMR were determined from the receiver operating characteristic curve, as shown in Table 1. There is no uniform standard for these five cutoff values in the literature. Reported cut-off values range from 40.1 [16] to 51.26 [17] for PNI, from 0.34 [18] to 0.84 [19] for LCR, from 1.93 [17] to 4.8 [20] for NLR, from 190.83 [21] to 645.22 [17] for PLR, and from 2 [22] to 3.6 [23] for LMR; the cutoff values in this study fall within these ranges.
In the univariate analysis, continuous variables (such as body mass index) were reported as mean ± standard deviation and compared between the infected and non-infected groups using the U test. Count data were expressed as rates or numbers of cases and compared between groups using the χ2 test. P < 0.05 was considered statistically significant, and factors that were statistically significant in the univariate analysis were included in the machine learning models. Continuous variables were normalized using the mean and SD of the training set. Categorical variables were encoded as binary variables, where 1 represents the presence of an event and 0 its absence. Gender was also encoded, with 1 for male and 0 for female.
Overfitting may occur during model training and degrade the performance of the model. Therefore, we first performed univariate analysis to filter out features that were not statistically significant, and then applied recursive feature elimination (RFE) based on random forest. This method first trains on all features, then recursively removes the least important features, and selects the feature set with the highest recall score [24].
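The text states that the cutoffs were read from the ROC curve but does not name the selection criterion. A common choice is the Youden index (sensitivity + specificity - 1), and the sketch below assumes that criterion; the function name and the toy data are hypothetical, and this is an illustration rather than the procedure run in SPSS.

```python
def youden_cutoff(values, labels, high_is_risk=True):
    """Return (cutoff, J) maximizing Youden's J over the observed values.

    values: marker values, one per patient.
    labels: 1 = infectious complication, 0 = none (both classes must be present).
    high_is_risk: if True, values >= cutoff are flagged; otherwise values <= cutoff.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        tp = fp = 0
        for v, y in zip(values, labels):
            flagged = v >= cut if high_is_risk else v <= cut
            if flagged and y == 1:
                tp += 1
            elif flagged:
                fp += 1
        sensitivity = tp / positives
        specificity = (negatives - fp) / negatives
        j = sensitivity + specificity - 1
        if j > best_j:  # ties keep the first (smallest) cutoff
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy data: a marker where higher values go with infection (e.g. an NLR-like marker).
high_marker = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
high_labels = [0, 0, 0, 1, 1, 1]
cut_high, j_high = youden_cutoff(high_marker, high_labels, high_is_risk=True)

# Toy data: a marker where lower values go with infection (e.g. an LCR-like marker).
low_labels = [1, 1, 1, 0, 0, 0]
cut_low, j_low = youden_cutoff(high_marker, low_labels, high_is_risk=False)
```

The `high_is_risk` switch reflects that NLR and PLR rise with risk while PNI, LCR, and LMR fall, so the flagged side of the cutoff differs between markers.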

Machine learning
Python 3.9 was used to construct the conventional machine learning models (Linear Regression, Random Forest, SVM, BP, LGBM, XGBoost) for predicting postoperative infectious complications in elderly patients with colorectal cancer. Except for XGBoost, the other five models were built with the scikit-learn package in Python 3.9. The data of the 512 elderly patients were randomly divided into a training set (70%) and a validation set (30%). The training set was used to develop the prediction models, and the validation set was used to verify their performance. Model performance was evaluated by AUC, ACC, recall, F1-score, and precision.
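The split and the threshold-based metrics named above can be sketched in plain Python as follows. This is an illustration under assumptions: the helper names, the seed, and the toy predictions are hypothetical, and AUC is left out because it needs predicted probabilities rather than class labels.

```python
import random


def split_indices(n, train_fraction=0.7, seed=0):
    """Randomly split range(n) into training and validation index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(train_fraction * n)
    return indices[:n_train], indices[n_train:]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = infection)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "ACC": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "F1": f1,
    }


# A 70/30 split of 512 patients gives the set sizes reported in the paper.
train_idx, valid_idx = split_indices(512)

# Hypothetical labels and predictions, for illustration only.
toy_true = [1, 1, 0, 0, 1, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 1, 1, 0, 0]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

With an infection rate of about 24%, accuracy alone can look good for a model that rarely predicts infection, which is one reason to report recall and F1 alongside it.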

Development of optimization algorithm
The use of the XGBoost model often faces two major problems: 1) XGBoost has many parameters to adjust, the adjustment process is tedious, and it is difficult to select the best parameters for the current problem; 2) because XGBoost is built on gradient boosting, it carries a risk of overfitting. Therefore, this paper uses a greedy algorithm (GA) to adjust the parameters. However, GA also has shortcomings in the context of the current problem. For example, the result of each iteration directly affects the next iteration, so an early suboptimal choice propagates and can cause a large error in the final result. This paper therefore proposes a Moderate Greedy Algorithm (MGA) to correct this. Like GA, MGA makes the best choice in the current state and gradually constructs the optimal solution. MGA adds a principle of moderation to the greedy idea: it restrains the greedy range and avoids excessive greed, which would otherwise lead to accumulated errors and a large error in the final result. Optimal results can be obtained by selecting an appropriate moderation principle, and weighted ensemble learning is used to increase robustness.
In this paper, MGA is used to adjust the max_depth, min_child_weight, gamma, subsample, colsample_bytree, reg_alpha, and reg_lambda parameters of XGBoost. Optimizing the seven parameters by grid search would require a large amount of computation and would also limit the range of each parameter. Therefore, we group the parameters in a greedy manner and optimize them step by step, and at each step we keep several of the best parameter subsets rather than only the single best one (hence the name 'MGA'). The main operation details are shown in Table 2.
The value range of each parameter is shown in Table 3. Based on the idea of the greedy algorithm, we divide the parameter adjustment process of XGBoost into six steps. Each step optimizes a further group of parameters, conditional on the locally optimal parameters obtained in the previous steps, until all parameters have been adjusted.
The main idea of boosting is to combine multiple weak learners with high bias to reduce the overall bias and form a strong learner. We were concerned that a single XGBoost model would perform poorly. To avoid the risk of overfitting due to inconsistent data distribution and small sample size, we use an ensemble of XGBoost models to increase the robustness of the model. During parameter adjustment, we keep not only the optimal set of parameters but several of the better sets. The steps of parameter adjustment are:
① Adjust max_depth and min_child_weight together, and keep the two parameter sets with the best scores.
② Adjust gamma, and keep the best two sets.
③ Adjust subsample and colsample_bytree together, and keep the best two sets.
④ Adjust the regularization coefficients reg_alpha and reg_lambda together, and keep the best set.
⑤ There are now 2*2*2*1 = 8 parameter sets. Finally, adjust learning_rate and num_boost_round, and keep the best set of parameters.
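The stepwise procedure above resembles a staged search in which every surviving candidate branches into its best few continuations. The sketch below is one possible reading, not the authors' implementation: the grids are hypothetical (Table 3 is not reproduced here), the final step is assumed to keep one setting per candidate, and `toy_score` is a stand-in for the validation score of an XGBoost model trained with the given parameters.

```python
from itertools import product

# Hypothetical optimum used only by the toy scoring function below.
TARGET = {
    "max_depth": 5, "min_child_weight": 3, "gamma": 0.1,
    "subsample": 0.8, "colsample_bytree": 0.8,
    "reg_alpha": 0.1, "reg_lambda": 1.0,
    "learning_rate": 0.05, "num_boost_round": 300,
}


def toy_score(params):
    """Stand-in for a model score: higher is better, best at TARGET."""
    return -sum((params[name] - TARGET[name]) ** 2 for name in params)


def moderately_greedy_search(stages, score_fn):
    """Tune parameters stage by stage, keeping `keep` best settings per candidate.

    stages: list of (grid, keep) pairs, where grid maps parameter names to
    candidate values and keep is how many settings survive per candidate.
    """
    candidates = [{}]
    for grid, keep in stages:
        names = list(grid)
        survivors = []
        for params in candidates:
            trials = []
            for values in product(*(grid[name] for name in names)):
                trial = {**params, **dict(zip(names, values))}
                trials.append((score_fn(trial), trial))
            # Sort on the score only, so tied dictionaries are never compared.
            trials.sort(key=lambda pair: pair[0], reverse=True)
            survivors.extend(trial for _, trial in trials[:keep])
        candidates = survivors
    return candidates


# Hypothetical grids; keep = 2, 2, 2, 1, 1 mirrors the 2*2*2*1 = 8 count in the text.
STAGES = [
    ({"max_depth": [3, 5, 7], "min_child_weight": [1, 3, 5]}, 2),
    ({"gamma": [0.0, 0.1, 0.2]}, 2),
    ({"subsample": [0.6, 0.8, 1.0], "colsample_bytree": [0.6, 0.8, 1.0]}, 2),
    ({"reg_alpha": [0.0, 0.1, 1.0], "reg_lambda": [0.1, 1.0, 10.0]}, 1),
    ({"learning_rate": [0.01, 0.05, 0.1], "num_boost_round": [100, 300, 500]}, 1),
]

final_candidates = moderately_greedy_search(STAGES, toy_score)
```

Setting every `keep` to 1 reduces this to a plain greedy search, so `keep` plays the role of the moderation principle: it bounds how much of the search is committed to a single early choice.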
The tuning can also be divided into different 'step groups'. A step group is a partition of the nine parameters listed above into several steps, with one or two parameters adjusted in each step. For example, the five-step adjustment above is one step group. The sequence "step 1: max_depth; step 2: min_child_weight; step 3: gamma; step 4: subsample, colsample_bytree; step 5: reg_alpha, reg_lambda; step 6: learning_rate, num_boost_round" is another step group. In summary, we obtained a total of 8 sets of optimal XGBoost parameters (Table 4). The eight XGBoost models obtained in this way are ranked according to whether each parameter choice was optimal or suboptimal, and are then combined by weighted ensemble learning. The optimal choice is assigned a weight of 2/3 and the suboptimal choice a weight of 1/3. The number of iterations is set to 500. Therefore, the proportion of the eight groups of parametric models obtained is 0.296, 0.148, 0.
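One way to arrive at eight ensemble proportions from the 2/3 and 1/3 weights is to multiply the per-step weights along each of the three steps that keep two candidates. That rule is an assumption, not something the text states; it does give 0.296 and 0.148 as its first two values, which agrees with the two proportions legible above. The function names are hypothetical.

```python
from itertools import product


def ensemble_weights(n_binary_steps=3, best=2 / 3, second=1 / 3):
    """Weight of each model as the product of its per-step rank weights.

    Assumes each of the n_binary_steps steps kept a best and a second-best
    setting, so there are 2 ** n_binary_steps models in total.
    """
    weights = []
    for ranks in product((best, second), repeat=n_binary_steps):
        weight = 1.0
        for rank_weight in ranks:
            weight *= rank_weight
        weights.append(weight)
    return weights


def weighted_probability(probabilities, weights):
    """Combine the per-model predicted probabilities for one patient."""
    return sum(p * w for p, w in zip(probabilities, weights))


weights = ensemble_weights()

# Hypothetical per-model probabilities for one patient, for illustration only.
combined = weighted_probability([0.5] * 8, weights)
```

Because the per-step weights sum to 1, the eight products also sum to 1, so the combined output stays on the probability scale.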

Patient characteristics
From March 2018 to April 2022, 563 elderly patients underwent radical resection of colorectal cancer in the gastrointestinal surgery department of our hospital. After screening against the inclusion and exclusion criteria, 512 patients were included in the study. In the final data set, no variable had more than 1% missing values. Missing values were filled by mean imputation, that is, each missing value was replaced with the mean of the corresponding feature. Postoperative infectious complications occurred in 125 patients (24%). The training set contained 358 patients (70%) and the validation set 154 patients (30%). There were 295 male patients (57.62%) and 217 female patients (42.38%). The characteristics of the data set are shown in Table 5.
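Mean imputation as described here can be sketched in a few lines. The representation of missing values as `None` and the function name are assumptions made for the example; the text does not say whether the means were taken over the full data set or over the training set only.

```python
def mean_impute(rows):
    """Replace None entries in each column with that column's observed mean.

    rows: list of equal-length lists of numbers or None.
    Every column must contain at least one observed value.
    """
    n_columns = len(rows[0])
    means = []
    for j in range(n_columns):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if row[j] is None else row[j] for j in range(n_columns)]
        for row in rows
    ]


# Hypothetical two-feature data with one missing value per column.
raw = [[1.0, None], [3.0, 4.0], [None, 6.0]]
imputed = mean_impute(raw)
```

With less than 1% of values missing per variable, the choice of imputation method is unlikely to change the results much, which is presumably why a simple approach was used.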

Feature selection using univariate and recursive feature elimination methods
To better understand the data, the patients in the training set and the validation set were divided into an infected group and a non-infected group, and the data were then analyzed by univariate analysis. Since less relevant features may have a negative impact on the performance of machine learning models, we further used the recursive feature elimination (RFE) method to select features and rank their importance. Together, the univariate and RFE methods reduced 36 features to 10: ASA, operation time, diabetes, presence of a stoma, tumor location, NLR, PLR, PNI, LCR, and LMR (P < 0.05). The results of the univariate analysis are shown in Table 6, and the RFE feature ranking is shown in Fig. 1.

Correlation analysis between risk factors
To examine whether the risk factors are correlated with one another, we analyzed the correlations among the statistically significant indicators selected by the RFE method. The results showed a high correlation between PNI and LMR (0.71) and a weaker correlation between NLR and PLR (0.35). The detailed results are shown in Fig. 2.
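The text does not state which correlation coefficient was used. Assuming the Pearson coefficient, a pairwise correlation table for a set of features can be computed as in the sketch below; the function names and the toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_table(columns):
    """Pairwise Pearson correlations for a dict of name -> list of values."""
    return {
        a: {b: pearson(columns[a], columns[b]) for b in columns}
        for a in columns
    }


# Hypothetical feature values, for illustration only.
toy_columns = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],   # perfectly increasing with A
    "C": [4.0, 3.0, 2.0, 1.0],   # perfectly decreasing with A
}
toy_table = correlation_table(toy_columns)
```

For dichotomized indicators or skewed laboratory values, a rank-based coefficient such as Spearman's may be a more natural choice than Pearson's.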

Performance evaluation of machine learning models for predicting postoperative infectious complications
We evaluated the predictive performance of the seven machine learning models for postoperative infectious complications in elderly patients. The AUC of the MGA-XGBoost prediction model was the highest (0.862), whereas Linear Regression, SVM, and BP showed only moderate predictive ability (AUC range 0.6 to 0.73).
The AUC value of each model is shown in Fig. 3.
In addition to AUC, this paper also introduces ACC, Recall, F1-score, and Precision to evaluate the performance of various prediction models.It can be seen from Table 7 that MGA-XGBoost, LGBM and XGBoost all show good accuracy and precision.

Feature importance analysis of MGA-XGBoost model
The importance of the internal features of the MGA-XGBoost prediction model, which had the highest accuracy, was visualized on the validation data set using three methods: cover, weight, and gain. For each feature, S denotes the total score over the three methods, and i_cover, i_weight, and i_gain denote the scores of that feature under each method. As shown in Fig. 4, LCR, diabetes, and operation time ranked first, second, and third, respectively.
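The exact way the three scores are combined into S is not reproduced in the text. The sketch below assumes one simple possibility: each method's scores are normalized to sum to 1 and then added per feature. In the xgboost Python package, per-feature dictionaries of this kind can be obtained from a trained model with `Booster.get_score(importance_type=...)` using `"cover"`, `"weight"`, or `"gain"`; the numbers used here are hypothetical and are not taken from the study.

```python
def combined_importance(cover, weight, gain):
    """Rank features by the sum of their normalized cover, weight, and gain scores.

    Each argument maps feature name -> raw importance under one method.
    The normalize-then-add rule is an assumption made for this sketch.
    """
    def normalize(scores):
        total = sum(scores.values())
        return {name: value / total for name, value in scores.items()}

    parts = [normalize(cover), normalize(weight), normalize(gain)]
    features = set().union(*parts)
    totals = {
        name: sum(part.get(name, 0.0) for part in parts) for name in features
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores for three features, for illustration only.
toy_cover = {"LCR": 6.0, "diabetes": 3.0, "operation_time": 1.0}
toy_weight = {"LCR": 5.0, "diabetes": 3.0, "operation_time": 2.0}
toy_gain = {"LCR": 7.0, "diabetes": 2.0, "operation_time": 1.0}
toy_ranking = combined_importance(toy_cover, toy_weight, toy_gain)
```

Normalizing before adding keeps one method from dominating only because its raw scores are on a larger scale, as gain and cover values often are compared with split counts.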

Discussion
This research, based on clinical data and machine learning methods, makes the following main contributions: 1) We found that 10 factors were significantly associated with infectious complications after colorectal cancer surgery: ASA, operation time, diabetes, tumor location, presence of a stoma, NLR, PLR, PNI, LCR, and LMR. 2) We constructed conventional predictive models for postoperative infectious complications in elderly patients with colorectal cancer; among them, the LGBM model performed best.
The role of systemic inflammatory response and nutritional status in cancer patients is increasingly recognized [25]. For example, systemic inflammatory response indicators and nutritional indicators can be used to predict infectious complications after malignant tumor surgery [11,26]. Okugawa [10] found that low preoperative LCR was an independent risk factor for surgical site infection in patients with colorectal cancer. Because cancer status usually activates systemic inflammatory responses, invasive surgery triggers abnormally enhanced inflammatory responses that reduce patient immunity [27]. Consistent with this, in our study increased preoperative NLR and PLR and decreased LCR and LMR indicated a higher risk of postoperative infectious complications. It is worth noting that LCR ranks first in the importance ranking of the internal features of the MGA-XGBoost model. Okita [28] pointed out that low PNI may be a significant predictor of postoperative infectious complications in patients with ulcerative colitis undergoing proctectomy with ileal pouch-anal anastomosis. Cancer patients occasionally have impaired nutritional intake during the perioperative period [29]. Malnutrition can also lead to a decline of immune function in cancer patients [30]; in particular, hypoproteinemia has a significant effect on humoral immunity and can lead to pathogen translocation, opportunistic pathogen conversion, and fungal proliferation [31]. Studies have shown that immunonutrition and special enteral formulas can reduce the incidence of postoperative infectious complications in patients undergoing colorectal cancer surgery [32]. The results of this study showed that the incidence of postoperative infectious complications increased when preoperative PNI was below 48.48. Therefore, this study showed that inflammatory response and nutritional indicators were significantly associated with postoperative infection. At the same time, this study identified five comprehensive inflammatory indicators related to postoperative infection of colorectal cancer by univariate analysis and the RFE method. According to different literature reports, these risk factors were significantly associated with postoperative infection [33,34]. In the era of rapid rehabilitation surgery, it is important to use these markers for early prediction and early diagnosis of infection, to avoid readmission and reduce medical costs.
ML refers to the iterative and automatic optimization of mathematical models to gradually and accurately fit available data [35]. There are thousands of machine learning algorithms, but each model has its limitations, and the best algorithm differs between situations [36]. The best model usually depends on the sample data set and the purpose of the analysis in a specific scenario [37]. For example, the BP model in this study had the lowest accuracy, which may be because BP transforms the characteristics of all problems into numbers and all reasoning into numerical calculations, resulting in a loss of information in its results [38]. Therefore, in this study, we calculated the prediction accuracy of six conventional machine learning models and compared their performance; among them, the LGBM model showed the best prediction ability.
LGBM uses pruning to keep the model from falling into a local optimum, and it uses second-derivative information and sampling in each iteration to prevent overfitting [39]. Accordingly, LGBM had the best overall performance among the conventional machine learning models for predicting postoperative infection of colorectal cancer, with an AUC of 0.833, an accuracy of 0.844, and a precision of 0.708. Most clinicians use standard statistical software packages (such as R) to develop machine learning methods, but standard software packages cannot make up for the shortcomings of machine learning itself. For example, XGBoost performs well in various ML competitions, but it has many parameters and is cumbersome to tune. Therefore, some scholars have studied improvements to XGBoost. In 2021, Peng [40] constructed a new model for predicting hypertension based on hybrid feature selection and standard XGBoost. The AUC and accuracy of the new model were about 7% higher than those of the unimproved model. Zhang [41] proposed a GA-XGBoost model for diabetes risk prediction. The experimental results show that the prediction accuracy of the GA-XGBoost model is better than that of linear regression, decision tree, support vector machine, and neural network, and that its parameter adjustment time is less than that of grid search and random walk. In this study, we used Python 3.9 and a greedy strategy to group the parameters and tune them step by step. At each step several parameter subsets are selected, and the final model is obtained by weighting. After training on 70% of the full data set, MGA-XGBoost increased AUC by 7.4% when tested on the remaining 30%. Therefore, the improved XGBoost model established in this study can help clinicians make the best prediction. This study shows that advances in artificial intelligence and machine learning will improve the performance of clinical predictive models.
Although complex algorithms such as XGBoost, support vector machine, and artificial neural network are increasingly popular and widely used in predictive modeling, they are based on a 'black box' design and are difficult to explain and apply in clinical practice [42]. Clinicians should require transparency and interpretability of the algorithm so that artificial intelligence can be held accountable for its predictions and recommendations. However, model interpretability cannot be improved at the expense of accuracy. Our main goal is to construct a more accurate, interpretable, and robust ML model for postoperative infection in elderly patients with colorectal cancer. Therefore, in this study, the importance of the internal features of the MGA-XGBoost prediction model, which had the highest accuracy, was visualized on the validation data set by three methods: cover, weight, and gain. By opening up the internal structure of the MGA-XGBoost model, the priority of these features in this study can be distinguished. This method is superior to other previously published opaque machine learning models. This provides an important basis for the clinical
In this study, the internal structure of the MGA-XGBoost model shows the importance of blood glucose indicators. In the comparative correlation analysis, the correlation between blood glucose and postoperative infectious complications was only 0.23, so it may be missed in routine analysis. Postoperative hyperglycemia is a common perioperative stress response [43]. Marks [44] held that perioperative blood glucose in diabetic patients should be kept stable at 6.67-10.0 mmol/L, and that blood glucose above 13.9 mmol/L or below 4.8 mmol/L is unfavorable to patients. At the same time, Nakamura et al. [45] pointed out that even under strict perioperative blood glucose control, diabetes is directly related to an increased risk of surgical site infection. This shows that there is an intrinsic relationship between diabetes and surgical site infection, beyond diabetes-related hyperglycemia. It may be due to disorders of sugar and protein metabolism in diabetic patients, which reduce the bactericidal capacity of white blood cells and the production of immunoglobulins and antibodies, resulting in low immunity. In addition, elderly patients with a longer duration of diabetes are prone to vascular neuropathy, resulting in slow blood flow and reduced tissue oxygen supply, which favors the growth of fungi and anaerobic bacteria, so they are more prone to postoperative infection than non-diabetic patients. Therefore, medical staff should strictly control the blood glucose level of diabetic patients and continue to use insulin during the perioperative period to avoid excessive blood glucose fluctuations.
Among studies of inflammatory response indicators, Okugawa [10] compared the predictive ability of LCR, CAR, NLR, PLR, and other inflammatory indicators. The results showed that LCR had the highest correlation with colorectal cancer recurrence and was a more reliable biomarker. This may be because preoperative CRP is associated with lymphopenia and an impaired T-lymphocyte response in patients with colorectal cancer [46], and because lymphocytes play a key role in the host's cytotoxic immune response to tumors, so that cell-mediated immunity is impaired in these patients. In the MGA-XGBoost model, LCR is the best predictor of postoperative infection in colorectal cancer compared with the other inflammatory indicators.
A longer operation increases the exposure time of the tissue at the surgical site and therefore the chance of contamination. Mik et al. [47] found that a total operation time of more than 180 minutes increases the risk of surgical site infection in deep incisions and organ spaces. At the same time, the longer the operation, the greater the possible trauma and blood loss, which further reduces the patient's resistance and makes the patient more prone to infection. For patients with a long expected operation time, a detailed surgical plan should therefore be formulated before the operation, so as to shorten the operation time as much as possible while ensuring the quality of the operation, and second-generation antibiotics should be given appropriately for prevention and control. This model can not only explain the relationship between features and risk factors but also predict the importance of features for individuals. If the model is prospectively validated, it can help clinicians determine which part of the intervention is most important, thus providing an interpretable and powerful tool for preventing postoperative infection.
This study has several notable limitations. First, the training sample size is limited because the cohort comes from only one center, which may lead to overfitting of the model. In the future, multi-center research is needed for external validation. Second, this study is retrospective, so there may be collection and input bias as well as unavoidable selection bias. For example, the incidence of postoperative anastomotic leakage is extremely low in our study. To improve the performance of artificial intelligence models, the models established in this study will eventually be applied at other medical sites to verify their scalability.
In summary, our study demonstrates for the first time that the MGA-XGBoost model with 10 risk factors can predict postoperative infectious complications in elderly CRC patients. At the same time, combining risk prediction with feature importance analysis allows clinicians to assess postoperative risks and potentially modifiable drivers.


Fig. 1 Feature importance ranking of the selected 10 features illustrated by random forest

Fig. 2 Correlation analysis between risk factors

Fig. 3 ROC curve for predicting postoperative infectious complications on the validation set

Table 1
The best cut-off values of the five indicators

Table 2
The value process of MGA

Table 3
Range of XGBoost parameters

Table 4
8 groups of XGBoost model parameter results

Table 5
Characteristics of the study patients (n = 512)

Table 6
Data characteristics analysis of the infected group and non-infected group (n = 512)

Table 7
The performance of 7 ML models in the validation set

Tian et al. BMC Medical Informatics and Decision Making (2024) 24:11